NSF PAR Search | NSF Public Access Repository

Can Foundation Models Watch, Talk and Guide You Step by Step to Make a Cake?

Bao, Yuwei; Yu, Keunwoo Peter; Zhang, Yichi; Storks, Shane; Bar-Yossef, Itamar; De La Iglesia, Alexander; Su, Megan; Zheng, Xiaolin; Chai, Joyce (November 2023, Findings of Empirical Methods in Natural Language Processing)

Despite tremendous advances in AI, it remains a significant challenge to develop interactive task guidance systems that can offer situated, personalized guidance and assist humans in various tasks. These systems need a sophisticated understanding of the user as well as the environment, and must make timely, accurate decisions about when and what to say. To address this issue, we created a new multimodal benchmark dataset, Watch, Talk and Guide (WTaG), based on natural interaction between a human user and a human instructor. We further proposed two tasks: User and Environment Understanding, and Instructor Decision Making. We leveraged several foundation models to study the extent to which these models can be quickly adapted to perceptually enabled task guidance. Our quantitative, qualitative, and human evaluation results show that these models can demonstrate fair performance in some cases with no task-specific training, but fast and reliable adaptation remains a significant challenge. Our benchmark and baselines will provide a stepping stone for future work on situated task guidance.
Full Text Available
DANLI: Deliberative Agent for Following Natural Language Instructions

Zhang, Yichi; Yang, Jianing; Pan, Jiayi; Storks, Shane; Devraj, Nikhil; Ma, Ziqiao; Yu, Keunwoo Peter; Bao, Yuwei; Chai, Joyce (January 2022, EMNLP)

Full Text Available
Tiered Reasoning for Intuitive Physics: Toward Verifiable Commonsense Language Understanding

https://doi.org/10.18653/v1/2021.findings-emnlp.422

Storks, Shane; Gao, Qiaozi; Zhang, Yichi; Chai, Joyce (January 2021, Findings of Conference on Empirical Methods in Natural Language Processing (EMNLP) 2021)

Large-scale, pre-trained language models (LMs) have achieved human-level performance on a breadth of language understanding tasks. However, evaluations based only on end-task performance shed little light on machines’ true ability in language understanding and reasoning. In this paper, we highlight the importance of evaluating the underlying reasoning process in addition to end performance. Toward this goal, we introduce Tiered Reasoning for Intuitive Physics (TRIP), a novel commonsense reasoning dataset with dense annotations that enable multi-tiered evaluation of machines’ reasoning process. Our empirical results show that while large LMs can achieve high end performance, they struggle to support their predictions with valid evidence. The TRIP dataset and our baseline results will motivate verifiable evaluation of commonsense reasoning and facilitate future research toward developing better language understanding and reasoning models.
Full Text Available
